A Markov Language Learning Model for Finite Parameter Spaces
Authors
Abstract
This paper shows how to formally characterize language learning in a finite parameter space as a Markov structure. Important new language learning results follow directly: explicitly calculated sample complexity learning times under different input distribution assumptions (including CHILDES database language input) and learning regimes. We also briefly describe a new way to formally model (rapid) diachronic syntax change.

BACKGROUND MOTIVATION: TRIGGERS AND LANGUAGE ACQUISITION

Recently, several researchers, including Gibson and Wexler (1994), henceforth GW, Dresher and Kaye (1990), and Clark and Roberts (1993), have modeled language learning in a (finite) space whose grammars are characterized by a finite number of parameters, that is, n-length Boolean-valued vectors. Many current linguistic theories now employ such parametric models explicitly or in spirit, including Lexical-Functional Grammar and versions of HPSG, besides GB variants. With all such models, key questions about sample complexity, convergence time, and alternative modeling assumptions are difficult to assess without a precise mathematical formalization. Previous research has usually addressed only the question of convergence in the limit without probing the equally important question of sample complexity: it is of little use that a learner can acquire a language if the number of examples required is extraordinarily high, hence psychologically implausible. This remains a relatively undeveloped area of language learning theory. The current paper aims to fill that gap. We choose as a starting point the GW Triggering Learning Algorithm (TLA). Our central result is that the performance of this algorithm and others like it is completely modeled by a Markov chain. We explore the basic computational consequences of this, including some surprising results about sample complexity and convergence time, the dominance of random walk over gradient ascent, and the applicability of these results to actual child language acquisition and possibly language change.

Background. Following Gold (1967), the basic framework is that of identification in the limit. We assume some familiarity with Gold's assumptions. The learner receives an (infinite) sequence of (positive) example sentences from some target language. After each, the learner either (i) stays in the same state or (ii) moves to a new state (changes its parameter settings). If after some finite number of examples the learner converges to the correct target language and never changes its guess, then it has correctly identified the target language in the limit; otherwise, it fails. In the GW model (and others) the learner obeys two additional fundamental constraints: (1) the single-value constraint: the learner can change only one parameter value at each step; and (2) the greediness constraint: if the learner is given a positive example it cannot recognize, changes one parameter value, and finds that it can now accept the example, then the learner retains that new value. The TLA essentially simulates this; see Gibson and Wexler (1994) for details.

THE MARKOV FORMULATION

Previous parameter models leave open key questions addressable by a more precise formalization as a Markov chain. The correspondence is direct. Each point i in the Markov space is a possible parameter setting. Transitions between states stand for probabilities p_ij that the learner will move from hypothesis state i to state j.
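To make the learning procedure concrete, the following is a minimal sketch of a single TLA update step under the single-value and greediness constraints, assuming a hypothetical accepts(grammar, sentence) predicate that decides whether a given parameter setting parses a sentence; the predicate, the name tla_step, and the tuple encoding of parameter vectors are illustrative assumptions, not code from the paper.

```python
import random

def tla_step(hypothesis, sentence, accepts, rng=random):
    """One TLA update on a single positive example.

    `hypothesis` is a tuple of n Boolean parameter values; `accepts(g, s)`
    decides whether the grammar with parameter setting g parses sentence s.
    """
    if accepts(hypothesis, sentence):
        # The example is already parsable: the learner stays in its state.
        return hypothesis
    # Single-value constraint: flip exactly one parameter, chosen at random.
    i = rng.randrange(len(hypothesis))
    candidate = list(hypothesis)
    candidate[i] = not candidate[i]
    candidate = tuple(candidate)
    # Greediness constraint: keep the flip only if the example now parses;
    # otherwise discard it and stay with the old hypothesis.
    return candidate if accepts(candidate, sentence) else hypothesis
```

It is the uniform random choice of which parameter to flip, combined with the distribution over incoming examples, that induces the transition probabilities p_ij of the Markov chain.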
As we show below, given a distribution over L(G), we can calculate the actual p_ij's themselves. Thus, we can picture the TLA learning space as a directed, labeled graph V with 2^n vertices. See figure 1 for an example in a 3-parameter system.[1] We can now use Markov theory to describe TLA parameter spaces, as in Isaacson and Madsen (1976).

[1] GW construct an identical transition diagram in the description of their computer program for calculating local maxima. However, this diagram is not explicitly presented as a Markov structure and does not include transition probabilities.
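Continuing the same illustrative setup, the sketch below shows one way the transition matrix over all 2^n parameter settings could be assembled from an assumed input distribution sentence_probs over the target language, and how the chain's fundamental matrix then yields expected convergence times (sample complexity). The helper names and the distribution are assumptions for illustration, not the paper's own computation.

```python
import itertools
import numpy as np

def transition_matrix(n, accepts, sentence_probs):
    """Build the 2^n x 2^n TLA transition matrix.

    `sentence_probs` maps each sentence of the target language to the
    probability that it is presented to the learner (e.g. a uniform or
    CHILDES-derived distribution over L(G)).
    """
    states = list(itertools.product((False, True), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for s in states:
        for sentence, prob in sentence_probs.items():
            if accepts(s, sentence):
                # Example already parsed: the learner keeps its hypothesis.
                P[index[s], index[s]] += prob
                continue
            for i in range(n):
                # Single-value constraint: one parameter flipped, chosen
                # uniformly at random (probability 1/n each).
                flipped = list(s)
                flipped[i] = not flipped[i]
                flipped = tuple(flipped)
                if accepts(flipped, sentence):
                    P[index[s], index[flipped]] += prob / n  # greediness: keep it
                else:
                    P[index[s], index[s]] += prob / n        # revert: stay put
    return states, P

def expected_examples_to_converge(states, P, target):
    """Expected number of examples before absorption in the target state.

    If the space contains local maxima (other absorbing states), I - Q is
    singular and the expected time from some starting states is infinite.
    """
    transient = [k for k, s in enumerate(states) if s != target]
    Q = P[np.ix_(transient, transient)]
    N = np.linalg.inv(np.eye(len(transient)) - Q)  # fundamental matrix
    times = N @ np.ones(len(transient))
    return {states[k]: t for k, t in zip(transient, times)}
```

With n = 3 this produces an eight-vertex chain of the kind pictured in figure 1; states from which the target grammar is unreachable show up as additional absorbing states.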
Similar Articles
A language learning model for finite parameter spaces.
This paper shows how to formally characterize language learning in a finite parameter space, for instance, in the principles-and-parameters approach to language, as a Markov structure. New language learning results follow directly; we can explicitly calculate how many positive examples on average ("sample complexity") it will take for a learner to correctly identify a target language with high ...
Remarks and Replies: Learning from Triggers
In this article we provide a refined analysis of learning in finite parameter spaces using the Triggering Learning Algorithm (TLA) of Gibson and Wexler (1994). We show that the behavior of the TLA can be modeled exactly as a Markov chain. This Markov model allows us to (1) describe formally the conditions for learnability in such spaces, (2) uncover problematic states in addition to the local ma...
Learning Bayesian Network Structure using Markov Blanket in K2 Algorithm
A Bayesian network is a graphical model that represents a set of random variables and their causal relationships via a Directed Acyclic Graph (DAG). There are basically two methods used for learning a Bayesian network: parameter learning and structure learning. One of the most effective structure-learning methods is the K2 algorithm. Because the performance of the K2 algorithm depends on node...
MAN-MACHINE INTERACTION SYSTEM FOR SUBJECT INDEPENDENT SIGN LANGUAGE RECOGNITION USING FUZZY HIDDEN MARKOV MODEL
Sign language recognition (SLR) has attracted growing interest in the human–computer interaction community. The major challenge SLR faces now is developing methods that scale well with increasing vocabulary size given a limited set of training data for the signer-independent application. Automatic SLR based on hidden Markov models (HMMs) is very sensitive to a gesture's shape inf...
Minimum classification error training of hidden Markov models for acoustic language identification
The goal of acoustic Language Identification (LID) is to identify the language of spoken utterances. The described system is based on parallel Hidden Markov Model (HMM) phoneme recognizers. The standard approach to learning Hidden Markov Model parameters is Maximum Likelihood (ML) estimation, which is not directly related to the classification error rate. Based on the Minimum Class...